Cross Language Information Integration Bridging the Gap
Author
Abstract
Integrating information in multiple natural languages is a challenging task that often requires manually created linguistic resources, such as a bilingual dictionary or examples of direct translations of text. Comparable corpora have important properties that can be exploited to infer word translations without a bilingual dictionary. The main premise underlying comparable corpora is that the translations of two words that co-occur in a source language also co-occur in the target language. Past work has used this property to extract a lexicon directly from comparable corpora. A major drawback of this work is that the number of parameters to be estimated (translation probabilities) increases quadratically with the vocabulary sizes. In this paper, we propose a novel cluster-based approach that maps groups of related words rather than the individual words themselves. This method has the advantage that the number of parameters to be estimated remains independent of the vocabulary size. Experiments show that the computational demands of this problem are very high and that the method seems to work well for small data sets. Approximating the multinomials produced by the clustering algorithm (LDA) to speed up computation degrades the performance of the model.
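The parameter-count argument in the abstract can be illustrated with a small sketch. It counts only the mapping parameters: one translation probability per word pair for the word-level approach, and one weight per cluster pair for the cluster-based approach. The function names, the vocabulary sizes, and the choice of 50 clusters per language are hypothetical values chosen for illustration; the abstract does not state the paper's actual parametrisation.

```python
def pairwise_parameter_count(source_vocab_size: int, target_vocab_size: int) -> int:
    """Word-level mapping: one translation probability per (source word, target word) pair."""
    return source_vocab_size * target_vocab_size


def cluster_parameter_count(num_source_clusters: int, num_target_clusters: int) -> int:
    """Cluster-level mapping: one weight per (source cluster, target cluster) pair."""
    return num_source_clusters * num_target_clusters


if __name__ == "__main__":
    # Hypothetical setting: 50 clusters per language, equal vocabulary sizes.
    num_clusters = 50
    for vocab_size in (1_000, 10_000, 100_000):
        word_level = pairwise_parameter_count(vocab_size, vocab_size)
        cluster_level = cluster_parameter_count(num_clusters, num_clusters)
        print(f"V={vocab_size:>7,}  word-level={word_level:>15,}  cluster-level={cluster_level:,}")
```

Under these assumptions the word-level count grows by a factor of 100 each time the vocabulary grows by a factor of 10, while the cluster-level count stays fixed once the number of clusters is chosen.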
Similar resources
Cross border E-Science and Research Partnership: Bridging the Gap Between Science and Media
E-Science is a tool that helps scientists to store, interpret, analyze and make a network of their data, and it can play a critical role in different aspects of the scientific goals and research. This commentary, under the topic of Cross Border E-Science and Research Partnership: Bridging the Gap between Science and Media,[1] attempts to shed light on E-Science with emphasis on three importa...
The Impact of Skill Integration on Task Involvement Load
The present study investigated whether word learning and retention in a second language are contingent upon a task's involvement load, i.e., the amount of need, search, and evaluation the task imposes. Laufer and Hulstijn (2001) contend that tasks with higher degrees of these three components induce higher involvement load, and are, therefore, more effective for word learning. To test this clai...
Cross Language Information Retrieval: an Experiment in Bilingual News Article Alignment from the Internet using MT
Cross Language Information Retrieval (CLIR) offers the potential for users to search document collections in foreign languages. This is particularly relevant now that the Internet has become a global information source. Machine translation (MT) has a key role in bridging the gap between the language of the users' query and that of the document collection as well as to help the user understand th...
Increasing the Effectiveness of Russian Language Teaching for Special Purposes (to the Problem of Integration of Language Training with Information Technology Courses)
The article is devoted to the problem of increasing the efficiency of language teaching for the special purposes of foreign students in studying Russian at a technical university. Particular attention is paid to the training of foreign students in the skills of working with information using the latest computer technology. The conclusions of the work are based on the analysis of the results of ...
Scholarship and practice: the contribution of ethnographic research methods to bridging the gap
Introduction Research methods are the means by which knowledge is acquired and constructed within a discipline. Research methods need to be both relevant and rigorous in order to be accepted as legitimate within a particular field of knowledge. Information systems (IS) is a field which has multiple stakeholders in its knowledge development, operating in contexts which have to deal with multipli...
Image-Language Association: are we looking at the right features?
The ever-growing popularity and availability of multimedia information has rendered automatic image-language association essential in a number of multimedia integration applications. Bridging the gap between the two media requires an appropriate feature-set for describing their common reference; one that will be both distinctive of the entities referred to and feasible to extract automatically...
Journal title:
Volume, issue
Pages -
Publication date: 2006